Embracing data abundance: BookTest Dataset for Reading Comprehension
Abstract
There is a practically unlimited amount of natural language data available. Still, recent work in text comprehension has focused on datasets that are small relative to current computing possibilities. This article makes a case for the community to move to larger data and, as a step in that direction, proposes BookTest, a new dataset similar to the popular Children’s Book Test (CBT) but more than 60 times larger. We show that training on the new data improves the accuracy of our Attention-Sum Reader model on the original CBT test data by a much larger margin than many recent attempts to improve the model architecture. On one version of the dataset, our ensemble even exceeds the human baseline provided by Facebook. We then show in our own human study that there is still room for further improvement.
Journal title:
- CoRR
Volume abs/1610.00956
Publication date: 2016